In this section, give a brief a description of your project and its goal, what data you are using to complete it, and what three faculty/staff in different fields you have spoken to about your project with a brief summary of what you learned from each person. Include a link to your final project GitHub repository.
Final Project on Github: https://github.com/Dokotela/BMIN503_Final_Project
+In the first paragraph, describe the problem addressed, its significance, and some background to motivate the problem. It is well-known that obesity is a problem in our society. This is not just true for adults, but also for children. It is estimated that 18.5% of chldren 2-19 years old were overweight as of 20161. The rate of overweight is over 30%2. Hispanics and Non-Hispanic blacks had a higher rate of obesity than Non-Hispanic whites, and those in a lower socio-economic class. The outcomes of this are many, varied and can be severe, including a myriad of health problems along with a wide-range of socio-emotional consequences. Pediatricians are typically not familiar with the best ways to counsel and educate families about nutrition and exercise, and typically have enough time to do so. One possible solution is telehealth interventions. There have already been a few studies demonstrating improvement in adolescent physical activity or increase in fruit and vegetable consumption through telehealth interventions3,4.
+In the second paragraph, explain why your problem is interdisciplinary, what fields can contribute to its understanding, and incorporate background related to what you learned from meeting with faculty/staff. In order for these interventions to be effective, multiple different groups will need to be involved. Patients typically see their primary pediatrician as responsible for their health information, so it would be reasonable to base such an intervention out of an outpatient clinic. However, as above, pediatricians don’t generally have the expertise or time to do such counseling. One could involve psychologists, pediatricians, endocrinologists, eating disorder specialists, socail workers, other behavioral health specialists, coaches, and to do such a telehealth intervention would require some technicians to assist with setup, tech support and similar issues. We worked with nutritionists, pediatricians, and psychologists to develop our intervention, which questionnaires would be appropriate, and how to properly use motivational interviewing techniques. One of the concerns, and I’m not sure that we did this as well as we should have with the interventions, was to focus on health (eating nutritious foods and getting regular exercise) instead of focusing on weight in children.
+In the first paragraph, describe the data used and general methodological approach.
Patients were recruited from a local Federally Qualified Health Center. Inclusion criteria were age five through 17, inclusive, with a diagnosis of asthma (requiring medication) or overweight (greater than or equal to 85% for age). All study subjects had to be fluent in English or Spanish. Parents/Guardians were also required to be fluent in English or Spanish. This was determined by their own assessment.
Prior to the study, it underwent and gained IRB approval and also an MOU and letter of support from the clinical medical director.
The clinic staff identified patients meeting the inclusion criteria. They then asked the patient’s permission to be contacted about the study. There were also fliers in the waiting room allowing the patient to contact the study team directly, and signed a release of information to allow review of medical history pertinent to the study. If the patient gave consent to contact, they were called by the study team, given an explanation of the study, and if interested were scheduled for their initial appointment.
All patients who meet the study inclusion criteria were offered an integrated approach to chronic condition management with a specialist team (a chronic condition specialist and a behavior specialist). Initially the goal was to recruit and have two arms, one with a technology component and the other without. As recruitment was more difficult than anticipated, it was changed to a single arm study.
At the initial visit, and after the study is explained and all questions answered, a written consent was obtained in English or Spanish from parents and those patients 18 years old, and assent was obtained from those patients 7-17. These were all captured on the tablet. All forms were available in English and Spanish. The patients’ history, including medical, family, genetic considerations and social history was evaluated, including a chart review. Data from the medical charts that will be extracted include: ER and hospital visits, type, frequency and dose of medications used, blood pressure, heart rate, weight, height, diet history and exercise patterns. At the initial visit for overweight children, the patients was assessed for their behaviors and attitudes regarding eating, physical activity, sedentary time and motivation. For those patients with asthma, the pediatrician also assessed understanding of the disease. At the first and last visit there were pre- and post-intervention assessment points, adherence to medical regimen, physical activity and dietary habits that were gathered. During the 4 intervening visits in addition to the first and last, quality of life of assessed. At the final visit data was also gathered aboutt he patient and parent perceptions of the program. All surveys were completed in REDCap (internet-based, secure, encrypted, HIPAA-compliant software), whether in person or online, to allow easier data analysis.
During the course of the study, the chronic condition pediatrician and behavior specialist acted in a consult capacity for their chronic condition, but the patient continued to see their regular pediatrician for primary care needs. At every visit any changes since the last visit were reviewed, including a chart review. For overweight children the specialists provided education about what is considered healthy eating, physical activity and motivation. For children with asthma, the specialists provided education about the disease itself, including symptoms, precipitating, risk and preventive factors, its natural history and proper management. The behavior health specialists then helped the patient to develop a plan on how to follow their treatment plan and achieve healthy behaviors. Problem-solving, goal-setting, and motivational interviewing techniques were used.
Each patient enrolled in the study was given access to technology in the form of a tablet computer. The tablets were installed with software to help educate patients and their families about asthma and overweight/obesity and track progress in managing them. In addition to educational apps, each tablet was preloaded with Fuze, a secure, HIPAA compliant teleconferencing app. The first and last visits were done in person at the clinic, but the four follow-up visits were conducted as a “virtual” visit via online teleconferencing. On the date of each “virtual” visit, the study participant was called 3 times to try and meet. For this group regular service and usage fees during the course of the study was covered as part of the study and was not be billed. Each participant was required to complete and sign an Equipment Check-Out Form and an End-User agreement form from Verizon. All participants were called 1 week before each appointment, then again 72 hours before the appointment.
At each visit attended, every participant received a $10.00 gift certificate.
+Subsequently, incorporate full R code necessary to retrieve and clean data, and perform analysis. Be sure to include a description of code so that others (including your future self) can understand what you are doing and why.
#delete the test patient
data <- data[-which(data$redcap_id == 'ce0b1de0534e326798805670fd231294'),]
#not sure why this one variable is coded differently, but this fixes it
data <- data %>% mutate(pedsqlparent_01 = factor(pedsqlparent_01, levels = c("", 0, 1, 2, 3, 4), labels = c(NA, 0, 1, 2, 3, 4)))
data$pedsqlparent_01 <- as.numeric(as.character(data$pedsqlparent_01))#collect the 4 columns on language preference into 2
languages <-function(info){
info <- add_column(info, "Patient's Language" = NA, .before = "eng")
info$`Patient's Language` <- case_when(
info$eng == 1 ~ "English",
info$esp == 1 ~ "Spanish",
TRUE ~ NA_character_
)
info <- add_column(info, "Parent's Language" = NA, .before = "eng")
info$`Parent's Language` <- case_when(
info$eng2 == 1 ~ "English",
info$esp2 == 1 ~ "Spanish",
TRUE ~ NA_character_
)
info[6:9] <- NULL
info
}
#cleanup languages
data <- languages(data)
#creating some better names for the demographic variables
dem_names <- function(info){
colnames(info)[colnames(info)=="not_part_ethnicity"] <- "Ethnicity"
colnames(info)[colnames(info)=="not_part_asthma"] <- "Asthma?"
colnames(info)[colnames(info)=="not_part_gender"] <- "Gender"
colnames(info)[colnames(info)=="not_part_obese"] <- "Obese?"
info
}
#better column names
data <- dem_names(data)#recodes sections of dataframe, pass full dataframe, 1st/last column (inclusive) to recode, plus string for recoding
recoding <- function(info, col1, col2, code){
info[,colnames(select(info, col1:col2))] <- apply(info[,colnames(select(info, col1:col2))], 2, function(x) {x <- recode(x, code); x})
info
}
#call function to recode data
data <- recoding(data, "pedsqlkids_01", "pedsqlkids_23", "'0'='100'; '1'='75'; '2'='50'; '3'='25'; '4'='0'")
data <- recoding(data, "pedsqlparent_01", "pedsqlparent_23", "'0'='100'; '1'='75'; '2'='50'; '3'='25'; '4'='0'")#summarize grade levels
data$dem_grade = factor(data$dem_grade,levels=c("00","0","1","2","3","4","5","6","7","8","9","10","11","12","13"), labels = c("Pre-K","Kindergarten","1","2","3","4","5","6","7","8","9","10","11","12","Other/Otro"))
data$dem_grade <- as.character(data$dem_grade)
data$dem_grade[(data$dem_grade == "Pre-K" | data$dem_grade == "Kindergarten" | data$dem_grade == "1" | data$dem_grade == "2" | data$dem_grade == "3" | data$dem_grade == "4" | data$dem_grade == "5")] <- "<= 5th Grade"
data$dem_grade[(data$dem_grade == "6" | data$dem_grade == "7" | data$dem_grade == "8")] <- "6-8th Grade"
data$dem_grade[(data$dem_grade == "9" | data$dem_grade == "10" | data$dem_grade == "11" | data$dem_grade == "12")] <- "9-12th Grade"
#remove unused levels
data <- droplevels(data)#this takes as arguments name of first and last column to combine, then loops through, seeing which columns are empty, and collecting them all in first column, then deleting it.
cleaner <- function(string1, string2, info){
cols <- which(colnames(info)==string1):which(colnames(info)==string2) #get col numbers for first and last string
info[cols] <- sapply(info[cols], as.character)
for(j in 1:nrow(info)){
for(i in cols){
info[cols[1]][[1]][j] <- paste(if_else(is.na(info[cols[1]][[1]][j]), "", info[cols[1]][[1]][j]),
if_else(is.na(info[i][[1]][j]), "", info[i][[1]][j]), sep="") #combine the current first column with any others that aren't null
}
}
info[cols[1]][[1]] <- na_if(info[cols[1]][[1]], "") #replace all "" with NA
info[cols[-1]] <- NULL
info
}
#add columns as needed
survey_columns <- function(info, list) {
for(i in 1:length(list)){
info[[list[i]]] <- NA
}
info
}
#combining columns together
data$tech_kid_brand___1 = factor(data$tech_kid_brand___1,levels=c("0","1"), labels = c(NA,"iPhone or iPad"))
data$tech_kid_brand___2 = factor(data$tech_kid_brand___2,levels=c("0","1"), labels = c(NA,"Samsung"))
data$tech_kid_brand___3 = factor(data$tech_kid_brand___3,levels=c("0","1"), labels = c(NA,"LG"))
data$tech_kid_brand___4 = factor(data$tech_kid_brand___4,levels=c("0","1"), labels = c(NA,"HTC"))
data$tech_kid_brand___5 = factor(data$tech_kid_brand___5,levels=c("0","1"), labels = c(NA,"Motorola"))
data$tech_kid_brand___6 = factor(data$tech_kid_brand___6,levels=c("0","1"), labels = c(NA,"Other"))
data$tech_kid_brand___7 = factor(data$tech_kid_brand___7,levels=c("0","1"), labels = c(NA,"Don't Know"))
data$tech_kid_os___1 = factor(data$tech_kid_os___1,levels=c("0","1"), labels = c(NA,"iPhone or iPad"))
data$tech_kid_os___2 = factor(data$tech_kid_os___2,levels=c("0","1"), labels = c(NA,"Android/Google"))
data$tech_kid_os___3 = factor(data$tech_kid_os___3,levels=c("0","1"), labels = c(NA,"Windows"))
data$tech_kid_os___4 = factor(data$tech_kid_os___4,levels=c("0","1"), labels = c(NA,"Blackberry"))
data$tech_kid_os___5 = factor(data$tech_kid_os___5,levels=c("0","1"), labels = c(NA,"Other"))
data$tech_kid_os___6 = factor(data$tech_kid_os___6,levels=c("0","1"), labels = c(NA,"Unsure"))
data <- cleaner("tech_kid_brand___1", "tech_kid_brand___7", data)
data <- cleaner("tech_kid_os___1", "tech_kid_os___6", data)#get some row means for the survey, pass the dataframe, first and last column (inclusive) to be included
rowmeaning <- function(info, col1, col2){
store <- info[grep(paste("^", col1, "$", sep = ""), colnames(info)):grep(paste("^", col2, "$", sep = ""), colnames(info))]
info <- rowMeans(store)
info[rowSums(is.na(store)) >= 0.5] <- NA
info[which(rowSums(is.na(store)) / ncol(store) >= 0.5)] <- NA
info
}
#create subgroups for pedsql
data <- survey_columns(info = data, list = c("qlPtTotal", "qlPtPhys", "qlPtEmotion", "qlPtSocial", "qlPtSchool", "qlParTotal", "qlParPhys", "qlParEmotion", "qlParSocial", "qlParSchool"))
label(data$asthma15)="If a child dies from an asthma attack, this usually means that there was no time to start any treatment."
#calculate summary calculation for pedsql filled out by the patients
data$qlPtTotal <- rowmeaning(data, "pedsqlkids_01", "pedsqlkids_23")
data$qlPtPhys <- rowmeaning(data,"pedsqlkids_01","pedsqlkids_08")
data$qlPtEmotion <- rowmeaning(data, "pedsqlkids_09", "pedsqlkids_13")
data$qlPtSocial <- rowmeaning(data, "pedsqlkids_14", "pedsqlkids_18")
data$qlPtSchool <- rowmeaning(data, "pedsqlkids_19", "pedsqlkids_23")
#calculate summary calculation for pedsql filled out by the parents
data$qlParTotal <- rowmeaning(data, "pedsqlparent_01", "pedsqlparent_23")
data$qlParPhys <- rowmeaning(data,"pedsqlparent_01","pedsqlparent_08")
data$qlParEmotion <- rowmeaning(data, "pedsqlparent_09", "pedsqlparent_13")
data$qlParSocial <- rowmeaning(data, "pedsqlparent_14", "pedsqlparent_18")
data$qlParSchool <- rowmeaning(data, "pedsqlparent_19", "pedsqlparent_23")#make dataframe for those that completed a final visit
complete <- subset(data, data$redcap_id %in% data$redcap_id[which(data$redcap_event_name=="visit6")])
#make dataframe from above of just the first visit (to evaluate demographics)
completeFirst <- droplevels(subset(complete, complete$redcap_event_name=="visit1"))
completeLast <- droplevels(subset(complete, complete$redcap_event_name=="visit6"))
#merging the first and last dataset to make one that can be more easily compared (variables end in .x and .y)
combined <- merge(completeFirst, completeLast, by="redcap_id")
#Table with similar data as above, but formatted differently
firstLast <- complete[complete$redcap_event_name %in% c('visit1', 'visit6'),]
firstLast <- droplevels(firstLast)#making some tables of basic descriptive data
list1 <- c("redcap_id", "Obese?", "Asthma?", "dem_gender", "dem_grade", "Ethnicity", "dem_language_pref", "tech_kid_device_time", "tech_kid_brand___1", "tech_kid_os___1", "dem_language_home", "dem_parent_habitants", "dem_parent_sex", "dem_parent_ethnicity", "dem_parent_language_pref", "dem_parent_marital", "dem_parent_us", "dem_parent_income", "dem_parent_job", "dem_parent_school")
#select those columns
demotable <- completeFirst[which(colnames(completeFirst) %in% list1)][, list1]
list2 <- c("id", "Is child obese", "Does child have asthma", "Gender of child", "Grade of child", "Ethnicity of child", "Child's preferred language", "Time child spends on device", "Technology brand used", "Technology OS used", "Language spoken at home", "Number of people living at home", "Gender of parent", "Ethnicity of parent", "Parent's preferred language", "Parent's Marital Status", "How long has parent been in US", "Parent's Income", "Parent's Employment", "Parent's highest school completed")
#rename the columns something nicer for Table 1
colnames(demotable) <- list2
tablelist <- c('<table><tr>')
tablelist <- c(tablelist, '<td valign="top" width="25%"><table>')
for(i in 2:8){ tablelist <- c(tablelist, '<tr><td>', kable(demotable %>% group_by_at(list2[i]) %>% summarise(" " = n()) %>% as.data.frame(.), format='html', escape = TRUE, position="centered") %>% kable_styling(full_width = TRUE), '</td></tr>') }
tablelist <- c(tablelist, '</table></td><td valign="top" width="25%"><table>')
for(i in 9:12){ tablelist <- c(tablelist, '<tr><td>', kable(demotable %>% group_by_at(list2[i]) %>% summarise(" " = n()) %>% as.data.frame(.), format='html', escape = TRUE, position="centered") %>% kable_styling(full_width = TRUE), '</td></tr>') }
tablelist <- c(tablelist, '</table></td><td valign="top" width="25%"><table>')
for(i in 13:17){ tablelist <- c(tablelist, '<tr><td>', kable(demotable %>% group_by_at(list2[i]) %>% summarise(" " = n()) %>% as.data.frame(.), format='html', escape = TRUE, position="centered") %>% kable_styling(full_width = TRUE), '</td></tr>') }
tablelist <- c(tablelist, '</table></td><td valign="top" width="25%"><table>')
for(i in 18:20){ tablelist <- c(tablelist, '<tr><td>', kable(demotable %>% group_by_at(list2[i]) %>% summarise(" " = n()) %>% as.data.frame(.), format='html', escape = TRUE, position="centered") %>% kable_styling(full_width = TRUE), '</td></tr>') }
tablelist <- c(tablelist, '</table></td></tr></table>')#applies ttest to column that ends in .x and .y (usually after merging two datasets)
ttest <- function(info, col){
t.test(info[[col]], info[[gsub(".x", ".y", col)]], paired=TRUE, alternative = "two.sided")
}
#trying to make easier fxn to apply ttest, applies ttest to all columns between col1 and col2 inclusive
applying <- function(col1, col2, info){
info <- as.data.frame(lapply(select(info, col1:col2, gsub(".x", ".y", col1):gsub(".x", ".y", col2)), as.numeric))
info <- t(sapply(colnames(select(info, col1:col2)), ttest, info=info)) %>%
as.data.frame() %>%
add_column(., "Name" = gsub(".x", "", row.names(.)), .before = "statistic")
}
#boxplot for significant changes from first and last visits
boxy <- function(info, name, sig, y, offset){
info[[name]] <- as.numeric(info[[name]])
text_color <- "black"
text_background <- if_else(sig[name,]$p.value <= 0.05, "red", "gray")
ggplot(info, aes_string(x = "redcap_event_name", y=name)) +
geom_boxplot() +
theme(text = element_text(size = 25), axis.text = element_text(size = 25)) +
labs(x = "", y="", title = str_wrap(label(complete[grep(name, colnames(complete))]), width=45)) +
annotate("text", x=1.5, y=y, label = paste("T-Test: t=", toString(round(as.numeric(sig[name,]$statistic), digits=5)), "\nP-value: ", toString(round(as.numeric(sig[name,]$p.value), digits=5))), color = text_color, size=7) +
annotate("rect", xmin=1.2, xmax=1.8, ymin=y-offset, ymax=y+offset, alpha=0.2, fill=text_background)
}
#get list of significant results for pedsql from patient, then plot
sig <- applying("qlPtTotal.x", "qlPtSchool.x", combined)
pqlpatientplot <- lapply(colnames(select(complete, "qlPtTotal":"qlPtSchool")), boxy, info = firstLast, sig = sig, y = 35, offset=7)
#plot list of significant results for pedsql by parent, then plot
sig <- applying("qlParTotal.x", "qlParSchool.x", combined)
pqlparentplot <- lapply(colnames(select(complete, "qlParTotal":"qlParSchool")), boxy, info = firstLast, sig = sig, y = 35, offset=7)
#get significant results for first and last visit for activity and nutrition surveys
sig <- applying("act_nutri_kids01.x", "act_nutri_kids27.x", combined)
actNutriKids <- lapply(sig[(sig$p.value <= 0.1),]$Name, boxy, info = firstLast, sig = sig, y = 8, offset=1)
#get significant results for first and last visit for activity and nutrition surveys from parents
sig <- applying("act_nutri_parent01.x", "act_nutri_parent31.x", combined)
actNutriParents <- lapply(sig[(sig$p.value <= 0.1),]$Name, boxy, info = firstLast, sig = sig, y = 8, offset=1)
#get significant results for first and last visit for weight surveys from patient
sig <- applying("weight_01.x", "weight_16.x", combined)
weightPts <- lapply(sig[(sig$p.value <= 0.1),]$Name, boxy, info = firstLast, sig = sig, y = 8, offset=1)weights=read.csv('weights.csv')
weights$date <- as.Date(as.character(weights$date))
colnames(weights)[ncol(weights)] <- "days"
weights$days <- weights$date - as.Date("2014-01-01")
prestudy <- subset(weights, date < "2015-09-01")
poststudy <- subset(weights, date >= "2015-09-01")
prestudy <- droplevels(prestudy)
poststudy <- droplevels(poststudy)
#this just groups the data into sets pre and post start of the study, then gets coefficients for linear regression
lining <- function(info) {
measures <- as.data.frame(levels(info$redcap_id))
colnames(measures) <- "redcap_id"
for(ins in levels(info$redcap_id)) {
measures$slope[which(measures$redcap_id==ins)] <- lm(weight ~ days, data=subset(info, redcap_id==ins))$coefficients[2]
measures$intercept[which(measures$redcap_id==ins)] <- lm(weight ~ days, data=subset(info, redcap_id==ins))$coefficients[1]
}
return(measures)
}
#plots all of the weights, pre and post beginning of study, along with linear regression lines for that part
plotting <- function(ins, pre, post) {
if(length(which(pre$redcap_id==ins)) > 1) {
precoeff <- lm(weight ~ days, data=subset(pre, redcap_id==ins))$coefficients
precoeffs <- c(0, precoeff[1], 608, precoeff[1]+608*precoeff[2])
} else {
precoeffs <- c(0,0,0,0)
}
if(length(which(post$redcap_id==ins)) > 1) {
postcoeff <- lm(weight ~ days, data=subset(post, redcap_id==ins))$coefficients
postcoeffs <- c(608, postcoeff[1]+postcoeff[2]*608, 1000, postcoeff[1]+1000*postcoeff[2])
} else {
postcoeffs <- c(0,0,0,0)
}
ggplot() +
geom_segment(aes(x = precoeffs[1], y =precoeffs[2], xend=precoeffs[3], yend=precoeffs[4], color = "blue4"), data=pre) +
geom_segment(aes(x = postcoeffs[1], y =postcoeffs[2], xend=postcoeffs[3], yend=postcoeffs[4], color = "firebrick1"), data=post) +
theme(legend.position="none") +
theme(axis.title.x=element_blank()) +
theme(axis.title.y=element_blank())
}
#collect the groups, get linear regression lines, store coefficients
measures <- merge(lining(prestudy), lining(poststudy), by = "redcap_id", all.y=TRUE)
measures$group <- if_else(grepl("Control", measures$redcap_id), "control", "study")
studyprepost <- subset(measures, group=="study")
controlprepost <- subset(measures, group=="control")
#create graphs for patient weights, fit segments of trend lines to weights before and after study started
wtplots <- lapply(levels(weights$redcap_id), plotting, pre=prestudy, post=poststudy)## randomForest 4.6-14
## Type rfNews() to see new features/changes/bug fixes.
##
## Attaching package: 'randomForest'
## The following object is masked from 'package:ggplot2':
##
## margin
## The following object is masked from 'package:dplyr':
##
## combine
#create dataframe of demotable
pts <- as.data.frame(lapply(demotable[,2:ncol(demotable)], as.factor)) %>%
add_column(., "ID" = demotable$id, .before = "Is.child.obese")
#function to define distribution of categorical data
distro <- function(info, name){
dis <- as.data.frame(prop.table(table(info[[name]]))) #creates a table of values in column, creates prop.table from this
sample(dis$Var1, 10, replace = TRUE, prob=c(dis$Freq)) #creates a random sample based on the variable frequency
}
#set number of samples for creating data
samples <- 500
sim <- lapply(colnames(pts[,2:ncol(pts)]), distro, info = pts) #create new data, put it into sim
sim.data <- data.frame(id=1:samples) #make new dataframe
for(i in 1:length(sim)){
sim.data <- cbind(sim.data, sim[[i]]) #put all data from list sim in dataframe
}
colnames(sim.data) <- colnames(pts)
sim.data2 <- sim.data
sim <- lapply(colnames(pts[,2:ncol(pts)]), distro, info = pts) #create new data, put it into sim
sim.data <- data.frame(id=1:samples) #make new dataframe
for(i in 1:length(sim)){
sim.data <- cbind(sim.data, sim[[i]]) #put all data from list sim in dataframe
}
colnames(sim.data) <- colnames(pts)
bars <- function(info, name){
ggplot(info, aes_string(name)) +
geom_bar()
}
sim.data2 <- sim.data
lapply(colnames(pts[,2:ncol(pts)]), bars, info = pts)## [[1]]
##
## [[2]]
##
## [[3]]
##
## [[4]]
##
## [[5]]
##
## [[6]]
##
## [[7]]
##
## [[8]]
##
## [[9]]
##
## [[10]]
##
## [[11]]
##
## [[12]]
##
## [[13]]
##
## [[14]]
##
## [[15]]
##
## [[16]]
##
## [[17]]
##
## [[18]]
##
## [[19]]
## [[1]]
##
## [[2]]
##
## [[3]]
##
## [[4]]
##
## [[5]]
##
## [[6]]
##
## [[7]]
##
## [[8]]
##
## [[9]]
##
## [[10]]
##
## [[11]]
##
## [[12]]
##
## [[13]]
##
## [[14]]
##
## [[15]]
##
## [[16]]
##
## [[17]]
##
## [[18]]
##
## [[19]]
## [[1]]
##
## [[2]]
##
## [[3]]
##
## [[4]]
##
## [[5]]
##
## [[6]]
##
## [[7]]
##
## [[8]]
##
## [[9]]
##
## [[10]]
##
## [[11]]
##
## [[12]]
##
## [[13]]
##
## [[14]]
##
## [[15]]
##
## [[16]]
##
## [[17]]
##
## [[18]]
##
## [[19]]
#need to see how many na there are to be able to perform randomforest plots
pts$na_count <- apply(pts, 1, function(x) sum(is.na(x)))
#clean up dataframe to perform forest plot
pts <- pts[pts$na_count < 10,] %>% #remove all rows with > 10 NAs
select(.,-c(colnames(.)[ apply(., 2, anyNA) ])) %>% #remove all columns with any NAs
.[1:length(.)-1] %>% #remove final column (sum of number of NAs)
droplevels(.)
#pts$names <- as.factor(c("Timmy", "Tommy", "Tammy", "Britney", "Bobby", "Bobbie", "Bobbee", "Boby", "Tom", "Tim", "Babs", "Barbra", "Barbie", "Barb", "Brad"))
# sim.data <- cbind(sim.data, as.data.frame(source('randomnames.r')[1]))
# colnames(sim.data) <- c(colnames(sim.data)[1:ncol(sim.data)-1], "name")
#this one works
pts.rf <- randomForest(as.factor(ID) ~ ., data = pts, ntree = 1000, importance = TRUE)
pts.rf.pred <- predict(pts.rf, pts, type = "prob")
#this one doesn't, can't have a predictor with more than 53 categories
sim.rf <- randomForest(as.factor(ID) ~ ., data = sim.data, ntree = 1000, importance = TRUE)
sim.rf.pred <- predict(sim.rf, sim.data, type = "prob")Describe your results and include relevant tables, plots, and code/comments used to obtain them. End with a brief conclusion of your findings related to the question you set out to address. You can include references if you’d like, but this is not required.
|
|
|
|
#perform ttests on the different groups
t.test(studyprepost$slope.x, studyprepost$slope.y, alternative = "two.sided")$p.value## [1] 0.149661
## [1] 0.2752352
## [1] 0.5451327
## [1] 0.1349553
studyprepost$diff <- studyprepost$slope.y - studyprepost$slope.x
controlprepost$diff <- controlprepost$slope.y - controlprepost$slope.x
t.test(studyprepost$diff, controlprepost$diff, alternative = "two.sided")##
## Welch Two Sample t-test
##
## data: studyprepost$diff and controlprepost$diff
## t = 0.23032, df = 8.6634, p-value = 0.8232
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.06584312 0.08067238
## sample estimates:
## mean of x mean of y
## -0.01730112 -0.02471575
#display the plots created above of the patients survey information
gridExtra::grid.arrange(grobs = pqlpatientplot, ncol=2) #patients pedsql survey results over 6 visits#plot the weight changes for each participant and control
gridExtra::grid.arrange(grobs = wtplots, ncol=4)